Optimal imperfect phylogeny reconstruction and haplotyping (IPPH).
Abstract
The production of large quantities of diploid genotype data has created a need for computational methods for large-scale inference of haplotypes from genotypes. One promising approach to the problem has been to infer possible phylogenies explaining the observed genotypes in terms of putative descendants of some common ancestral haplotype. The first attempts at this problem proceeded on the restrictive assumption that observed sequences could be explained by a perfect phylogeny, in which each variant locus is presumed to have mutated exactly once over the sampled population's history. Recently, the perfect phylogeny model was relaxed and the problem of reconstructing an imperfect phylogeny (IPPH) from genotype data was considered. A polynomial-time algorithm was developed for the case when a single site is allowed to mutate twice, but the general problem remained open. In this work, we solve the general IPPH problem and show for the first time that it is possible to infer optimal q-near-perfect phylogenies from diploid genotype data in polynomial time for any constant q, where q is the number of "extra" mutations required in the phylogeny beyond those present in a perfect phylogeny. This work has applications to the haplotype phasing problem as well as to various related problems in phylogenetic inference, analysis of sequence variability in populations, and association study design. Empirical studies on human data of known phase show this method to be competitive with the leading phasing methods and provide strong support for the value of continued research into algorithms for general phylogeny construction from diploid data.
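To make the terms in the abstract concrete, the following is a minimal sketch, not the algorithm of the paper. It assumes a common encoding that the abstract does not specify: haplotype sites are 0 or 1, and genotype sites are 0 (homozygous 0), 1 (homozygous 1), or 2 (heterozygous). The function names `explains` and `passes_four_gamete_test` are hypothetical. The second function applies the classical four-gamete test: a set of binary haplotypes fits a perfect phylogeny exactly when no pair of sites shows all four combinations 00, 01, 10, and 11. A haplotype set that fails the test needs at least one extra mutation, that is, q >= 1 in the abstract's notation.

```python
from itertools import combinations


def explains(h1, h2, genotype):
    """Return True if the haplotype pair (h1, h2) is a valid phasing of genotype.

    Assumed encoding: haplotype sites are 0/1; genotype sites are
    0 (homozygous 0), 1 (homozygous 1), or 2 (heterozygous).
    """
    if not (len(h1) == len(h2) == len(genotype)):
        raise ValueError("haplotypes and genotype must have equal length")
    for a, b, g in zip(h1, h2, genotype):
        # Equal alleles give a homozygous site; differing alleles give 2.
        expected = a if a == b else 2
        if g != expected:
            return False
    return True


def passes_four_gamete_test(haplotypes):
    """Return True if no pair of sites exhibits all four gametes.

    For binary haplotypes this is the classical condition for the
    existence of a perfect phylogeny (each site mutates exactly once).
    """
    if not haplotypes:
        return True
    n_sites = len(haplotypes[0])
    for i, j in combinations(range(n_sites), 2):
        gametes = {(h[i], h[j]) for h in haplotypes}
        if len(gametes) == 4:
            return False
    return True


if __name__ == "__main__":
    # The genotype (2, 1, 0) can be phased as (0, 1, 0) and (1, 1, 0).
    print(explains((0, 1, 0), (1, 1, 0), (2, 1, 0)))
    # Three gametes at the two sites: a perfect phylogeny exists.
    print(passes_four_gamete_test([(0, 0), (0, 1), (1, 0)]))
    # Adding (1, 1) produces all four gametes: some site must mutate twice.
    print(passes_four_gamete_test([(0, 0), (0, 1), (1, 0), (1, 1)]))
```

A haplotyping method in this family searches over phasings accepted by a check like `explains` for ones whose haplotypes fit a phylogeny with few extra mutations; the sketch only verifies a candidate and does not perform that search.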
Similar resources
Algorithms for Imperfect Phylogeny Haplotyping (IPPH) with a Single Homoplasy or Recombination Event
The haplotype inference (HI) problem is the problem of inferring 2n haplotype pairs from n observed genotype vectors. This is a key problem that arises in studying genetic variation in populations, for example in the ongoing HapMap project [5]. In order to have a hope of finding the haplotypes that actually generated the observed genotypes, we must use some (implicit or explicit) genetic model ...
Influence of Tree Topology Restrictions on the Complexity of Haplotyping with Missing Data
Haplotyping, also known as haplotype phase prediction, is the problem of predicting likely haplotypes based on genotype data. One fast haplotyping method is based on an evolutionary model where a perfect phylogenetic tree is sought that explains the observed data. Unfortunately, when data entries are missing, as is often the case in real laboratory data, the resulting formal problem IPPH, which...
Incremental Haplotype Inference, Phylogeny, and Almost Bipartite Graphs
We address the combinatorial problem of inferring haplotypes in a population that forms a perfect phylogeny (PP) given a sample of genotypes. The problem is relevant because, in DNA sequencing, genotypes are easier to obtain than haplotyping by DNA sequencing. Since PP’s appear naturally and frequently on DNA sequences of restricted length, PP haplotyping is a favourable approach to facilitate ...
Perfect Path Phylogeny Haplotyping with Missing Data Is Fixed-Parameter Tractable
Haplotyping via perfect phylogeny is a method for retrieving haplotypes from genotypes. Fast algorithms are known for computing perfect phylogenies from complete and error-free input instances—these instances can be organized as a genotype matrix whose rows are the genotypes and whose columns are the single nucleotide polymorphisms under consideration. Unfortunately, in the more realistic setti...
Algorithms for inferring haplotypes.
Haplotype phase information in diploid organisms provides valuable information on human evolutionary history and may lead to the development of more efficient strategies to identify genetic variants that increase susceptibility to human diseases. Molecular haplotyping methods are labor-intensive, low-throughput, and very costly. Therefore, algorithms based on formal statistical theories were sh...
Journal title:
- Computational Systems Bioinformatics. Computational Systems Bioinformatics Conference
Publication date: 2006